pytraj, nglview: bringing AmberTools to the Python ecosystem

In [1]:
from IPython import display
display.Image('team.png', width=700)
Out[1]:

Slides: github.com/hainm/amber_meeting_2016

pytraj: github.com/Amber-MD/pytraj

Workflow

In [2]:
display.Image('./old_work_flow.jpg')
Out[2]:
  • Simpler workflow?

  • Not a professional programmer but still want to write cool analysis in Python and make it as fast as C++? (ask Dan?)

  • Read a new trajectory format that cpptraj does not support? (write an XTC file parser in 3 lines of code?)

  • Write parallel code for analysis? (Hint: how can we make a parallel calculation with only one line of code?)

  • Interface with other programs? (pyrosetta, rdkit, cclib, …)

Is Python really slow?

Nope

from numba import jit
from numpy import arange

# The jit decorator tells Numba to compile this function.
# The argument types will be inferred by Numba when the function is called.
@jit  # makes the code as fast as C/C++
def sum2d(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i, j]
    return result

sum2d(arange(9).reshape(3, 3))  # -> 36.0
In [3]:
# data: http://www.amber.utah.edu/AMBER-workshop/London-2015/DNA-tutorial/
import pytraj as pt

traj0 = pt.load('md.nc', 'dna.prmtop')
traj0
Out[3]:
pytraj.Trajectory, 831 frames: 
Size: 0.298563 (GB)
<Topology: 16074 atoms, 5144 residues, 5122 mols, PBC with box type = truncoct>
           
In [4]:
traj = traj0.autoimage()['!:WAT']
traj
Out[4]:
pytraj.Trajectory, 831 frames: 
Size: 0.014488 (GB)
<Topology: 780 atoms, 46 residues, 24 mols, PBC with box type = truncoct>
           
In [5]:
# compute rmsd and convert raw data to pandas' DataFrame
data = pt.rmsd(traj, ref=0, mask=':1-24&!@H=', dtype='dataframe')
data.head(10)
Out[5]:
RMSD_00001
0 2.447129e-07
1 6.686739e-01
2 7.146302e-01
3 8.289533e-01
4 9.041873e-01
5 9.124143e-01
6 9.056889e-01
7 9.573107e-01
8 9.137387e-01
9 1.095286e+00
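By default pt.rmsd fits each frame to the reference before measuring the deviation. As a rough illustration of the formula itself (not pytraj's actual implementation), the no-fit RMSD between two coordinate sets is just the root mean square of the per-atom displacements:

```python
import numpy as np

def rmsd_nofit(xyz, ref):
    """RMSD between two (n_atoms, 3) coordinate arrays, without fitting."""
    diff = xyz - ref
    return np.sqrt((diff * diff).sum() / len(xyz))

# toy example: shift every atom by 1 Angstrom along x
ref = np.zeros((10, 3))
xyz = ref.copy()
xyz[:, 0] += 1.0
print(rmsd_nofit(xyz, ref))  # -> 1.0
```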
In [6]:
%matplotlib inline
data.plot()
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x2aaad73688d0>
In [8]:
data.hist()
Out[8]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x2aaae10ae160>]], dtype=object)

Parallel

  • openmp (through cpptraj)
  • multiprocessing
  • mpi
  • mpi + openmp?
  • separate cpptraj, cpptraj.OMP, cpptraj.MPI builds? Nope
In [9]:
traj = pt.iterload(['tz2.0.nc', 'tz2.1.nc', 'tz2.2.nc'], 'tz2.parm7')
# multiprocessing
data = pt.pmap(pt.rmsd, traj, ref=0, mask='@CA', n_cores=8)

# MPI
# data = pt.pmap_mpi(pt.rmsd, traj, ref=0, mask='@CA')
# run: mpirun -n 8 python your_code.py

# serial: data = pt.rmsd(traj, ref=0, mask='@CA')

Crazy idea: Write code in serial and make it run in parallel

def method(traj, mask, ...):
    for frame in traj:
        do_something_fun(frame)
        ...
    return data

pt.pmap(method, traj, mask, ..., n_cores=8)

# TB of data? no problem
# write serial code in Cpptraj? no problem
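The trick behind pmap is simple frame-range chunking: split the trajectory into n_cores contiguous chunks, run the unchanged serial function on each chunk, and concatenate the per-chunk results. A minimal sketch of that pattern (using threads here to keep the demo self-contained; pytraj itself uses multiprocessing or MPI, and `pmap_sketch`/`split_range` are made-up names for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def split_range(n_frames, n_chunks):
    """Yield (start, stop) frame ranges covering 0..n_frames."""
    size, rem = divmod(n_frames, n_chunks)
    start = 0
    for i in range(n_chunks):
        stop = start + size + (1 if i < rem else 0)
        yield start, stop
        start = stop

def pmap_sketch(func, frames, n_cores=4):
    # run the same serial function on each chunk of frames
    chunks = [frames[a:b] for a, b in split_range(len(frames), n_cores)]
    with ThreadPoolExecutor(max_workers=n_cores) as pool:
        results = pool.map(func, chunks)
    # concatenate per-chunk results back into one list
    return [x for chunk in results for x in chunk]

# "serial code": compute something per frame
double = lambda chunk: [2 * f for f in chunk]
print(pmap_sketch(double, list(range(10))))  # -> [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```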

Parallel scaling

In [10]:
from IPython import display
display.Image('bench_pmap_casegroup.png', width=600)
Out[10]:

Integrate with pysander (Python interface for sander)

In [11]:
traj2 = pt.iterload('tz2.nc', 'tz2.parm7')
energies = pt.energy_decomposition(traj2, igb=8, dtype='dataframe')
energies[['bond', 'angle', 'dihedral', 'gb']].head(6)

# want to make the calculation faster?
# energies = pt.pmap(pt.energy_decomposition, traj2, igb=8, n_cores=8)
Out[11]:
       bond       angle    dihedral          gb
0  0.015314  128.545148  111.611329 -412.532664
1  0.013582  105.064945  105.392413 -400.090422
2  0.012521  103.520284   93.030850 -439.927013
3  0.016334   94.560780  105.522288 -400.956276
4  0.013338   99.508124  105.850222 -404.061030
5  0.015662   95.670638  109.626304 -389.559778

Writing a new file parser in 3 lines of code?

In [12]:
# XTC
# amber.conda install mdtraj  # 10 seconds to install
import mdtraj as md

t0 = md.load('monolayer.xtc', top='monolayer.pdb')

coordinates = t0.xyz.astype('f8')
traj = pt.Trajectory(xyz=coordinates, top='monolayer.pdb')
pt.center_of_mass(traj)
Out[12]:
array([[ 2.93015586,  2.60094379,  1.66991936],
       [ 3.01727943,  2.57976876,  1.59680848],
       [ 3.02472708,  2.57837296,  1.58989385],
       ..., 
       [ 3.0366626 ,  2.57732852,  1.5900617 ],
       [ 3.08111659,  2.58228309,  1.59082286],
       [ 3.08029504,  2.58351248,  1.58572677]])
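pt.center_of_mass returns one (x, y, z) point per frame, which is just the mass-weighted average of the atomic positions. A minimal numpy sketch of the same formula (the coordinates and masses below are made up for illustration):

```python
import numpy as np

def center_of_mass(xyz, masses):
    """Mass-weighted average position per frame.

    xyz:    (n_frames, n_atoms, 3) coordinates
    masses: (n_atoms,) atomic masses
    """
    weighted = masses[None, :, None] * xyz      # (n_frames, n_atoms, 3)
    return weighted.sum(axis=1) / masses.sum()  # (n_frames, 3)

# two equal masses at x=0 and x=2 -> center of mass at x=1
xyz = np.array([[[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]])
masses = np.array([12.0, 12.0])
print(center_of_mass(xyz, masses))  # -> [[1. 0. 0.]]
```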

Documentation?

Unofficial manual: amber-md.github.io/pytraj

In [13]:
help(pt.calc_center_of_mass)
Help on function calc_center_of_mass in module pytraj.all_actions:

calc_center_of_mass(traj=None, mask='', top=None, dtype='ndarray', frame_indices=None)
    compute center of mass
    
    Examples
    --------
    >>> import pytraj as pt
    >>> traj = pt.datafiles.load_tz2()
    >>> # compute center of mass residue 3 for first 2 frames.
    >>> pt.calc_center_of_mass(traj(stop=2), ':3')
    array([[-0.661702  ,  6.69124347,  3.35159413],
           [ 0.5620708 ,  7.82263042, -0.72707798]])

NGLVIEW - Trajectory viewer in Jupyter notebook

  • light viewer for Jupyter notebook (~2 MB source code)
  • beautiful rendering
  • fast (WebGL)
  • super easy to install
    amber.pip install nglview
    
  • able to view TBs of data (really, though you should use the NetCDF format)
  • able to plug any backends (pytraj, mdtraj, ParmEd, ...)
In [14]:
display.Image('./3pqr_nglview.png')
Out[14]:
In [15]:
display.Image('./trpzip2_nglview.png')
Out[15]: